Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian-Vector Multiply
Authors
Abstract
The online incremental gradient (or backpropagation) algorithm is widely considered to be the fastest method for solving large-scale neural-network (NN) learning problems. In contrast, we show that an appropriately implemented iterative batch-mode (or block-mode) learning method can be much faster. For example, it is three times faster in the UCI letter classification problem (26 outputs, 16,000 data items, 6,066 parameters with a two-hidden-layer multilayer perceptron) and 353 times faster in a nonlinear regression problem arising in color recipe prediction (10 outputs, 1,000 data items, 2,210 parameters with a neuro-fuzzy modular network). The three principal innovative ingredients in our algorithm are the following: First, we use scaled trust-region regularization with inner-outer iteration to solve the associated “overdetermined” nonlinear least squares problem, where the inner iteration performs a truncated (or inexact) Newton method. Second, we employ Pearlmutter’s implicit sparse Hessian matrix-vector multiply algorithm to construct the Krylov subspaces used to solve for the truncated Newton update. Third, we exploit sparsity (for preconditioning) in the matrices resulting from the NNs having many outputs.
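To make the second ingredient concrete, here is a minimal sketch of Pearlmutter's Hessian-vector product, realized as forward-over-reverse automatic differentiation in JAX. The toy single-hidden-layer loss is an illustrative assumption, not the paper's actual MLP or neuro-fuzzy networks; the point is only that H·v is computed exactly without ever forming H.

```python
import jax
import jax.numpy as jnp

def loss(params, x, y):
    # Toy single-hidden-layer network with sum-of-squares error; a
    # stand-in (assumption) for the paper's networks.
    W1, b1, W2, b2 = params
    h = jnp.tanh(x @ W1 + b1)
    return 0.5 * jnp.sum((h @ W2 + b2 - y) ** 2)

def hvp(params, v, x, y):
    # Pearlmutter's trick: H @ v = d/dt [grad L(params + t*v)] at t = 0,
    # i.e. forward-mode differentiation of the gradient function.
    g = lambda p: jax.grad(loss)(p, x, y)
    return jax.jvp(g, (params,), (v,))[1]

# Repeated products hvp(params, v_k, x, y) generate the Krylov subspace
# span{g, Hg, H^2 g, ...} that the truncated-Newton inner iteration uses.
```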
Similar references
On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems
This paper describes a method of dogleg trust-region steps, or restricted Levenberg-Marquardt steps, based on a projection process onto the Krylov subspaces for neural networks nonlinear least squares problems. In particular, the linear conjugate gradient (CG) method works as the inner iterative algorithm for solving the linearized Gauss-Newton normal equation, whereas the outer nonlinear algor...
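To illustrate that inner iteration, the following is a minimal matrix-free sketch, assuming the network parameters are packed into a single flat vector and a hypothetical residual_fn maps that vector to the stacked residuals; the boundary handling below is a crude projection standing in for the exact dogleg crossing point.

```python
import jax
import jax.numpy as jnp

def gauss_newton_cg(residual_fn, params, radius, max_iters=50, tol=1e-6):
    # Inner CG on the Gauss-Newton normal equation (J^T J) p = -J^T r,
    # with J = d residual / d params applied matrix-free via jvp / vjp.
    r0, vjp_fn = jax.vjp(residual_fn, params)
    jtj = lambda v: vjp_fn(jax.jvp(residual_fn, (params,), (v,))[1])[0]
    b = -vjp_fn(r0)[0]                 # right-hand side: -J^T r
    p, r = jnp.zeros_like(b), b
    d, rs = b, b @ b
    for _ in range(max_iters):
        Ad = jtj(d)
        alpha = rs / (d @ Ad)
        p_new = p + alpha * d
        if jnp.linalg.norm(p_new) >= radius:
            # Crude trust-region truncation: project onto the boundary
            # (a dogleg step would solve for the exact crossing point).
            return radius * p_new / jnp.linalg.norm(p_new)
        p = p_new
        r = r - alpha * Ad
        rs_new = r @ r
        if jnp.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return p
```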
Image Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing, where there exist many low-level representations of images, e.g., SIFT, HOG, and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation, and principal component analysis, are employed to d...
Arnoldi-based Sampling for High-dimensional Optimization using Imperfect Data
We present a sampling strategy suitable for optimization problems characterized by high-dimensional design spaces and noisy outputs. Such outputs can arise, for example, in time-averaged objectives that depend on chaotic states. The proposed sampling method is based on a generalization of Arnoldi’s method used in Krylov iterative methods. We show that Arnoldi-based sampling can effectively esti...
Block Krylov Space Methods for Linear Systems with Multiple Right-hand Sides: an Introduction
In a number of applications in scientific computing and engineering one has to solve huge sparse linear systems of equations with several right-hand sides that are given at once. Block Krylov space solvers are iterative methods that are especially designed for such problems and have fundamental advantages over the corresponding methods for systems with a single right-hand side: much larger sear...
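For reference, a minimal block conjugate-gradient sketch in the spirit this introduction describes, assuming a symmetric positive-definite A and the s right-hand sides stacked as columns of B; the deflation and rank monitoring that practical block solvers require are omitted.

```python
import jax.numpy as jnp

def block_cg(A, B, max_iters=100, tol=1e-8):
    # Solve A X = B for all s right-hand sides at once (A symmetric
    # positive definite). Each iteration searches an s-dimensional block
    # direction, which is why block Krylov methods can converge in far
    # fewer steps than s independent single-vector CG runs.
    X = jnp.zeros_like(B)
    R = B - A @ X
    P = R
    for _ in range(max_iters):
        AP = A @ P
        alpha = jnp.linalg.solve(P.T @ AP, R.T @ R)   # small s-by-s system
        X = X + P @ alpha
        R_new = R - AP @ alpha
        if jnp.linalg.norm(R_new) < tol:
            return X
        beta = jnp.linalg.solve(R.T @ R, R_new.T @ R_new)
        P = R_new + P @ beta
        R = R_new
    return X
```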
Recursion Relations for the Extended Krylov Subspace Method
The evaluation of matrix functions of the form f(A)v, where A is a large sparse or structured symmetric matrix, f is a nonlinear function, and v is a vector, is frequently subdivided into two steps: first an orthonormal basis of an extended Krylov subspace of fairly small dimension is determined, and then a projection onto this subspace is evaluated by a method designed for small prob...
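That two-step procedure can be sketched for a plain (non-extended) Krylov subspace as follows, with the matrix exponential as an illustrative choice of f; the extended variant would additionally enrich the basis with powers of A^{-1}, and breakdown checks are omitted from this sketch.

```python
import jax.numpy as jnp
from jax.scipy.linalg import expm   # example matrix function f

def krylov_fAv(A, v, m, f=expm):
    # Step 1: Arnoldi builds an orthonormal basis V of K_m(A, v) and the
    # small projected matrix H with H[:m, :m] = V_m^T A V_m.
    n = v.shape[0]
    V = jnp.zeros((n, m + 1))
    H = jnp.zeros((m + 1, m))
    beta = jnp.linalg.norm(v)
    V = V.at[:, 0].set(v / beta)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            h = V[:, i] @ w
            H = H.at[i, j].set(h)
            w = w - h * V[:, i]
        hnext = jnp.linalg.norm(w)    # breakdown (hnext ~ 0) not handled
        H = H.at[j + 1, j].set(hnext)
        V = V.at[:, j + 1].set(w / hnext)
    # Step 2: evaluate f on the small matrix and project back,
    # f(A) v ~= beta * V_m f(H_m) e_1.
    return beta * V[:, :m] @ f(H[:m, :m])[:, 0]
```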